In quantum computation every unitary operation can be decomposed into quantum circuits: a series of
single-qubit rotations and a single type of entangling two-qubit gate, such as the Controlled-NOT (C-NOT) gate.
Two measures are important when judging the complexity of a circuit: the total number of C-NOT gates
needed to implement it and the depth of the circuit, measured by the minimal number of computation steps
needed to perform it. Here we give an explicit and simple quantum circuit scheme for the preparation of arbitrary
quantum states, which can directly utilize any decomposition scheme for arbitrary full quantum gates, thus
connecting the two problems. Our circuit reduces the depth of the best currently known circuit by a factor of 2. It
also reduces the total number of C-NOT gates from $2^n$ to $\frac{23}{24} 2^n$ in the leading order for an even number of qubits.
Specifically, for four qubits the scheme decreases the upper bound from 11 C-NOT gates to 9 and the depth from 11 to
5 steps. Our results are expected to help in designing and building small-scale quantum circuits
using present technologies.



PACS numbers: 03.67.Ac, 42.50.Dv

INTRODUCTION 

Quantum information and computation theory ([1] and references
therein) has received growing attention in recent decades owing to
its potential to outperform information processing based on
classical physics in areas such as secure communication [2]
or the efficient implementation of certain computational tasks, e.g.,
prime number factorization [3].

Similarly to classical computation, every quantum computation,
represented as a unitary operation performed on a given
state of qubits, can be decomposed into small operation
blocks in which only a subset of qubits is changed non-trivially.
One-qubit operations alone cannot be composed into a general
unitary operation, as they never change the degree of
entanglement within the state. However, a single type of two-qubit
operation (for example, C-NOT [4]) in combination with arbitrary
one-qubit rotations suffices [5].

The complexity of quantum circuits is usually measured 
in the number of C-NOT gates needed to perform the de- 
sired unitary operation. The reason to count the number of 
two-qubit gates is mainly experimental since their realization 
is much more demanding and introduces more imperfections 
than the realization of one-qubit gates. Adding every new C- 
NOT to the circuit increases its overall imperfection. This 
constitutes the main obstacle preventing realization of quan- 
tum computation within sufficient precision. It is therefore 
crucial to design circuits with the least possible number of en- 
tangling gates. 

In general, implementing an arbitrary unitary operation requires a
number of C-NOT gates that is exponential in the number of qubits
involved. This can be seen by simple counting of the parameters of
an n-qubit unitary operation. Several attempts have been made to
optimize the number of gates needed for general operations [6-13].



In situations where the input for a quantum computer or a
quantum communication protocol is a known quantum state,
we are not interested in performing a completely defined unitary
transformation. Instead, we aim only to prepare a given state
$|\phi\rangle$, i.e., to perform a transformation from an initial state $|\psi\rangle$
to a different target state, $|\psi\rangle \to |\phi\rangle$, where a whole class of
unitaries $U$ fulfills the condition $U|\psi\rangle = |\phi\rangle$.



It is known that one needs an exponential number of C-NOT
gates to prepare a generic quantum state, i.e., in the leading
order this number is $N_{\mathrm{CNOT}} = c \cdot 2^n$, where $c$ is a pre-factor
and $n$ is the number of qubits. Any optimization can only
decrease the pre-factor but cannot beat the exponential
dependence. The best known result so far is $c = 1$ [9]. Here we give
an explicit quantum circuit reducing the pre-factor to $c = \frac{23}{24}$
for $n$ even. Specifically, using our scheme we decrease the
known upper bound from 11 C-NOT gates to 9 for four qubits
and from 57 C-NOT gates down to 46 for six qubits, keeping
the existing bound of 26 C-NOT gates for five qubits. The
lower bounds are 6, 13 and 29 C-NOT gates, respectively (see
below).



The reduction of the overall number of C-NOT gates might not,
however, be the only aim of the optimization procedure.
When searching for efficient algorithms, the depth of the quantum
circuit, i.e., the minimal number of computation steps required
to accomplish the computation, is crucial [16]. In the general
case, the depth might be as high as the overall number of
C-NOT gates, so that no two gates can be performed in
parallel, as is the case in Ref. [9]. In our scheme the two most
expensive phases run in parallel, so that for an even number of
qubits the depth is, in the leading order, half of the number of
C-NOT gates.
LOWER BOUNDS

A general $n$-qubit pure state is fully described by $2^{n+1} - 2$
real parameters. During the preparation process, these parameters
are introduced sequentially by performing single-qubit
rotations (each rotation introduces three Euler angles) along
with C-NOT gates. C-NOT gates as such do not introduce any
parameters, but they act as barriers that separate one-qubit
rotations so that these cannot merge into a single resulting
rotation for each qubit. Naively, one could expect that
every C-NOT gate can be accompanied by two one-qubit
operations, one for the control and one for the target qubit,
applied after the C-NOT gate, which would introduce six real
parameters per C-NOT gate. Due to existing identities [6],
however, only four real parameters can be introduced with one
C-NOT gate. This can be understood as follows: a rotation
about the z axis applied to the control qubit commutes with the
C-NOT gate. Similarly, a rotation about the x axis applied to the
target qubit commutes with the C-NOT gate. These two rotations
can therefore be commuted backward through the C-NOT gate and
combined with the rotations applied after the previous C-NOT
gates acting on the respective qubits. Thus every C-NOT gate
lets us introduce four real parameters of the desired state.
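
The two commutation relations used in this argument can be checked numerically. The following is a minimal sketch in plain Python, assuming the common conventions $R_z(\theta) = e^{-i\theta Z/2}$ and $R_x(\theta) = e^{-i\theta X/2}$ and a C-NOT whose control is the first qubit; the helper names are illustrative and do not come from the original work. A y rotation on the control is included as a counterexample.

```python
import cmath
import math

def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def kron(a, b):
    """Kronecker (tensor) product of two 2x2 matrices, giving a 4x4 matrix."""
    return [[a[i][j] * b[p][q] for j in range(2) for q in range(2)]
            for i in range(2) for p in range(2)]

def close(a, b, tol=1e-12):
    """Entry-wise comparison of two matrices."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

IDENTITY = [[1, 0], [0, 1]]
# C-NOT with the first qubit as control; basis order |00>, |01>, |10>, |11>.
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

def rz(theta):
    """Rotation about the z axis (assumed convention exp(-i*theta*Z/2))."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

def rx(theta):
    """Rotation about the x axis (assumed convention exp(-i*theta*X/2))."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s], [-1j * s, c]]

def ry(theta):
    """Rotation about the y axis (assumed convention exp(-i*theta*Y/2))."""
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -s], [s, c]]

def commutes_with_cnot(gate):
    """True if the 4x4 gate commutes with the C-NOT defined above."""
    return close(matmul(CNOT, gate), matmul(gate, CNOT))

theta = 0.7  # arbitrary test angle
print(commutes_with_cnot(kron(rz(theta), IDENTITY)))  # z rotation on the control
print(commutes_with_cnot(kron(IDENTITY, rx(theta))))  # x rotation on the target
print(commutes_with_cnot(kron(ry(theta), IDENTITY)))  # y rotation on the control
```

The first two checks succeed and the third fails, which is the reason why two of the six Euler angles following a C-NOT gate can be absorbed into earlier rotations while the remaining four cannot.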

Further parameters can be added by local unitary transformations
on the qubits at the beginning of the process. Naively,
one would expect to introduce three real parameters per qubit,
corresponding to the three Euler angles. However, this is not the
case. Starting from a specific product state (e.g., $|0\rangle^{\otimes n}$), we
can only rotate every single qubit into a given direction, which
gives us two parameters per qubit. The third, missing parameter
is just a phase on every qubit; these phases add up over all
qubits and influence only the global phase. Therefore, on $n$
qubits with $k$ C-NOT gates we may introduce altogether up
to $4k + 2n$ real parameters. Requiring $4k + 2n \ge 2^{n+1} - 2$ gives a
lower bound on the number of C-NOT gates needed to prepare a
state: 6 for four qubits, 13 for five qubits and 29 for six qubits.
For large $n$ we get the lower bound $k = \frac{1}{2} 2^n$ on the number of
C-NOT gates in the leading order.

The lower limit for the depth of the circuit also grows
exponentially with the number of qubits, with a linear correction.
This can be seen from the fact that no more than $\frac{n}{2}$ C-NOT
gates can be performed in one computation step. For the depth, too,
the only possible optimization is a reduction of the pre-factor,
up to a linear correction, with the lower bound $\frac{2^n}{n}$.
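
The parameter-counting bounds can be evaluated directly. The sketch below is only an illustration of the counting argument: the function names are hypothetical, and the depth bound assumes that at most $\lfloor n/2 \rfloor$ C-NOT gates fit into one computation step.

```python
def cnot_lower_bound(n):
    """Smallest integer k with 4k + 2n >= 2**(n + 1) - 2."""
    # Parameters that the initial one-qubit rotations cannot supply.
    missing = 2 ** (n + 1) - 2 - 2 * n
    return max(0, -(-missing // 4))  # ceiling division by 4

def depth_lower_bound(n):
    """Minimal number of steps if floor(n / 2) C-NOT gates run per step (n >= 2)."""
    per_step = n // 2
    return -(-cnot_lower_bound(n) // per_step)  # ceiling division

for n in (4, 5, 6):
    print(n, cnot_lower_bound(n), depth_lower_bound(n))
# 4 6 3
# 5 13 7
# 6 29 10
```

The gate counts 6, 13 and 29 and the depths 3 and 7 for four and five qubits agree with the values quoted in the text.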

FOUR QUBITS 

The Hilbert space of four qubits can be factorized into two
parts, where each part is associated with two qubits. An arbitrary
pure state $|\Psi\rangle$ of four qubits can then be expressed using the
(standard) Schmidt decomposition as

$$|\Psi\rangle = \sum_{i=1}^{4} \alpha_i |\psi\rangle_i |\phi\rangle_i . \tag{1}$$

Here $|\psi\rangle_i$, $i = 1, \ldots, 4$, are four normalized orthogonal states of
the first two qubits and similarly $|\phi\rangle_i$ are four normalized
orthogonal states of the second two qubits. The states are given
with a nontrivial global phase. The coefficients $\alpha_i$ are real and
positive and they obey $\sum_{i=1}^{4} \alpha_i^2 = 1$. Without loss of
generality we can rewrite the decomposition (1) in such a way
that $|\psi\rangle_i$ and $|\phi\rangle_i$ are defined only up to a global phase.
Their relative phases (with respect to different $i$'s) are then
included in the generalized coefficients $\alpha_i$, which become
complex. As we are interested in $|\Psi\rangle$ only up to its global phase,
we can choose $\alpha_1$ to be real and positive.

The pure state $|\Psi\rangle$ is specified by $2^5 - 2 = 30$ real
parameters. The four states $|\psi\rangle_i$ are specified by 6, 4, 2 and 0
parameters (due to the orthogonality condition), and so are the four
states $|\phi\rangle_i$. The four coefficients $\alpha_i$ require 6 independent real
parameters, due to the normalization condition
and the choice of the global phase. This gives altogether 30
parameters, as expected.
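
The tally can be written out explicitly. In the sketch below, the $j$-th state of an orthonormal two-qubit basis (four complex amplitudes, i.e., eight real numbers) is counted as having $8 - 1 - 1 - 2(j-1)$ free real parameters: one is removed by normalization, one by the free phase, and two by each orthogonality condition with a previously fixed state. This accounting is our reading of the numbers 6, 4, 2 and 0 and is not spelled out in the text.

```latex
\underbrace{(6 + 4 + 2 + 0)}_{|\psi\rangle_i}
+ \underbrace{(6 + 4 + 2 + 0)}_{|\phi\rangle_i}
+ \underbrace{(2 \cdot 4 - 1 - 1)}_{\alpha_i}
= 12 + 12 + 6 = 30 = 2^5 - 2 .
```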

Phase 1 

To prepare the state $|\Psi\rangle$ starting from the initial state $|0000\rangle$,
we first generate the state with the generalized (complex)
Schmidt coefficients on the first two qubits:

$$|0000\rangle \to \left( \alpha_1 |00\rangle + \alpha_2 |01\rangle + \alpha_3 |10\rangle + \alpha_4 |11\rangle \right) |00\rangle . \tag{2}$$

This step does not define a unitary operation completely;
it is a state-preparation operation on two qubits (starting
from the known state $|00\rangle$ we end in a state specified by the
coefficients of the generalized Schmidt decomposition). Therefore,
as shown in Ref. [10], it can be realized by one C-NOT
operation in combination with suitable one-qubit rotations.

Phase 2 

We perform two C-NOT operations, one with the control
on the first qubit and the target on the third qubit, and the other
with the control on the second qubit and the target on the
fourth qubit. In this way we "copy" the basis states
of the first two qubits onto the respective states of the second
two qubits, and obtain a state of four qubits that
has the same Schmidt decomposition coefficients as the target
state (1):

\begin{align}
& \left( \alpha_1 |00\rangle + \alpha_2 |01\rangle + \alpha_3 |10\rangle + \alpha_4 |11\rangle \right) |00\rangle \to \tag{3} \\
& \to \alpha_1 |00\rangle |00\rangle + \alpha_2 |01\rangle |01\rangle + \alpha_3 |10\rangle |10\rangle + \alpha_4 |11\rangle |11\rangle . \nonumber
\end{align}

For this phase we need only two C-NOT operations;
one-qubit rotations are not necessary.
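
The "copying" step can be traced on a small state-vector model. The following is a minimal sketch, assuming a state stored as a dictionary from bit strings to amplitudes; the coefficient values are arbitrary placeholders chosen only so that the state is normalized, and `apply_cnot` is an illustrative helper, not part of any library.

```python
def apply_cnot(state, control, target):
    """Apply a C-NOT to a state stored as {bitstring: amplitude}.

    Qubits are indexed from 0, left to right in the bit string.
    """
    new_state = {}
    for bits, amp in state.items():
        if bits[control] == "1":
            flipped = "1" if bits[target] == "0" else "0"
            bits = bits[:target] + flipped + bits[target + 1:]
        new_state[bits] = new_state.get(bits, 0) + amp
    return new_state

# Placeholder generalized Schmidt coefficients (hypothetical values).
alphas = {"00": 0.5, "01": 0.5j, "10": -0.5, "11": 0.5}

# State after phase 1: coefficients on the first two qubits, |00> on the rest.
state = {bits + "00": amp for bits, amp in alphas.items()}

state = apply_cnot(state, control=0, target=2)  # first qubit -> third qubit
state = apply_cnot(state, control=1, target=3)  # second qubit -> fourth qubit

for bits in sorted(state):
    print(bits, state[bits])
```

After the two gates only the basis states 0000, 0101, 1010 and 1111 carry amplitude, each with its original coefficient, which is the structure of the right-hand side of Eq. (3).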

Phase 3 

Keeping the Schmidt decomposition form, we apply the unitary
operation that transforms the basis states of the first two
qubits into the four states $|\psi\rangle_i$. We obtain:

\begin{align}
|00\rangle \to |\psi\rangle_1 , \qquad & |01\rangle \to |\psi\rangle_2 , \tag{4} \\
|10\rangle \to |\psi\rangle_3 , \qquad & |11\rangle \to |\psi\rangle_4 . \nonumber
\end{align}

As for any two-qubit unitary operation, we do not need more
than 3 C-NOT gates [6].

FIG. 1: Gate sequence for the preparation of an arbitrary four-qubit state.
Individual one-qubit rotations (not depicted) need to be applied
between the C-NOT gates. The four individual phases are described in the
text. The two C-NOT gates in phase 2, as well as the gates in phases 3 and
4, can be performed in parallel, as they address different qubits.
Altogether one needs 9 C-NOT gates, 4 pairs of which can be performed
in parallel.



Phase 4 



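In the final phase of the circuit we perform a unitary
operation on the third and fourth qubits in order to transform
their computational basis states into the Schmidt basis states
of Eq. (1):

\begin{align}
|00\rangle \to |\phi\rangle_1 , \qquad & |01\rangle \to |\phi\rangle_2 , \tag{5} \\
|10\rangle \to |\phi\rangle_3 , \qquad & |11\rangle \to |\phi\rangle_4 . \nonumber
\end{align}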



Similarly to phase 3, phase 4 of the four-qubit circuit again uses
3 C-NOT operations. Altogether we have thus used 1 + 2 + 3 + 3 =
9 C-NOT gates for the entire quantum state preparation circuit,
which is less than the best previous result of 11 C-NOT gates,
which can be deduced from Ref. [9]. However, it stays above the
minimum of 6 gates obtained from parameter counting.

The depth of the circuit is 5: the first phase takes one
computation step, the second phase can be done in one computation
step, and the third and fourth phases can be done in parallel in
three computation steps. This is less than half of the result of
Ref. [9] and is optimal for 9 C-NOT gates. The theoretical minimal
depth is 3, deduced from the fact that at least 6 C-NOT gates are
needed and no more than two can be performed in one step.

FIVE QUBITS

To illustrate our state preparation procedure for an odd number
of qubits, where the entire Hilbert space cannot be factorized
into Hilbert spaces of equal dimensions, we give an example for
five qubits. We first factorize the Hilbert space into two parts,
with one part associated with two qubits and the other with three
qubits. The Schmidt decomposition of an arbitrary five-qubit state
with respect to this factorization has almost the same structure
as in the case of four qubits in Eq. (1). One has

$$|\Psi\rangle = \sum_{i=1}^{4} \alpha_i |\psi\rangle_i |\phi\rangle_i . \tag{6}$$

Again, the summation runs over at most four terms; the only
difference is that the states $|\phi\rangle_i$, $i = 1, \ldots, 4$, are now
three-qubit states. We again choose to include the relative phases of
the states in the coefficients $\alpha_i$ and proceed with phases
one to three in the same way as for four qubits. The only
difference is in the fourth phase, where we perform a three-qubit
unitary operation:

\begin{align}
|00\rangle |0\rangle \to |\phi\rangle_1 , \qquad & |01\rangle |0\rangle \to |\phi\rangle_2 , \tag{7} \\
|10\rangle |0\rangle \to |\phi\rangle_3 , \qquad & |11\rangle |0\rangle \to |\phi\rangle_4 . \nonumber
\end{align}

Such a unitary can be implemented by no more than 20 C-NOT
gates [11]. Moreover, this unitary is not completely defined
(the third qubit it acts on is initially always in the state $|0\rangle$) and thus
a further reduction of the number of C-NOT gates might be
possible. Even without such optimizations, our state preparation
procedure for five qubits requires 1 + 2 + 3 + 20 = 26 C-NOT
gates, which equals the result of Ref. [9]. The lower limit of 13
C-NOT gates suggests that further optimization is possible.

The depth of the procedure is 22 computation steps, with
one step for phase one, one step for phase two and 20 steps
for performing phases three and four in parallel. This is
less than the lowest known depth of 26 of Ref. [9], but more
than the theoretical lower bound of 7.

GENERAL CASE

We will now apply the main idea presented for four and
five qubits to the general case of $n$ qubits. We begin with a
factorization of the Hilbert space of $n$ qubits into two parts of
equal dimension for $n$ even, so that each part is associated with
$\frac{n}{2}$ qubits. For an odd number of qubits we factorize the Hilbert
space into parts of $\frac{n-1}{2}$ and $\frac{n+1}{2}$ qubits. On the first part of the
qubits we prepare a state whose amplitudes in the computational
basis are given by the generalized Schmidt coefficients.
Then we apply a set of C-NOT gates between the qubits
in the first and the second part. Finally, we perform two
unitary operations, one on the first part and one on the second
part of the qubits. We treat the cases of even and odd numbers
of qubits separately.

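
The bookkeeping of this general layout can be sketched in a few lines. The function below is a hypothetical helper, not taken from the original work: it numbers the qubits from 1, puts the smaller part first, and pairs qubit $j$ of the first part with qubit $j + k$ of the second part for the C-NOT layer, as is done in the four- and five-qubit examples.

```python
def phase_layout(n):
    """Return (first_part, second_part, cnot_pairs) for n >= 2 qubits.

    Qubits are numbered from 1. The first part holds k = floor(n / 2)
    qubits, the second part the remaining n - k qubits.
    """
    k = n // 2
    first_part = list(range(1, k + 1))
    second_part = list(range(k + 1, n + 1))
    # One C-NOT per qubit of the first part: control j, target j + k.
    cnot_pairs = [(j, j + k) for j in first_part]
    return first_part, second_part, cnot_pairs

print(phase_layout(4))
print(phase_layout(5))
```

For four qubits this gives the pairs (1, 3) and (2, 4) used in phase 2 above; for five qubits the same two pairs appear, and the fifth qubit is touched only by the final unitary on the second part.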

Even number of qubits

We write the number of qubits as $n = 2k$. The qubits are
divided into two parts, each containing $k$ qubits. With respect to
this division the Schmidt decomposition of an arbitrary state
of $n$ qubits has the following form

$$|\Psi\rangle = \sum_{i=1}^{2^k} \alpha_i |\psi\rangle_i |\phi\rangle_i , \tag{8}$$

where both $|\psi\rangle_i$ and $|\phi\rangle_i$ are normalized states of $k$ qubits and
$\alpha_i$ are complex coefficients.

The initial state of the qubits is assumed to be the product state
$|0\rangle^{\otimes 2k}$, in which each qubit is in the state $|0\rangle$. On the first $k$ qubits
we prepare a superposition state whose amplitudes in the
computational basis are the Schmidt coefficients:

$$|0\rangle^{\otimes 2k} \to \left( \sum_{i=1}^{2^k} \alpha_i |i\rangle \right) |0\rangle^{\otimes k} . \tag{9}$$

The sequence of 0's and 1's in the computational basis states
($|00\ldots0\rangle$, $|00\ldots1\rangle$, ..., $|11\ldots1\rangle$) represents the binary encoding of
the index $i$ in the states $|i\rangle$: qubits in the state $|1\rangle$ stand exactly
at those positions where there is a 1 in the binary notation of
$i$, and all other qubits are in the state $|0\rangle$. To prepare a state on $k$
qubits as required in Eq. (9), we can utilize the existing bound
from Ref. [9], which allows us to prepare it with $2^k - k - 1$
C-NOT gates. We will later return to the discussion of
further optimization possibilities for this particular phase.

In the second phase, we perform $k$ C-NOT gates with qubit
$j$ as the control and qubit $j + k$ as the target, for $j$ running from
1 to $k$. This brings us to the desired Schmidt form of our
state:

$$\left( \sum_{i=1}^{2^k} \alpha_i |i\rangle \right) |0\rangle^{\otimes k} \to \sum_{i=1}^{2^k} \alpha_i |i\rangle |i\rangle . \tag{10}$$



Phases three and four are $k$-qubit unitary operations
performed on the first and second half of the qubits, respectively.
We obtain

\begin{align}
\sum_{i=1}^{2^k} \alpha_i |i\rangle |i\rangle & \to \sum_{i=1}^{2^k} \alpha_i |\psi\rangle_i |i\rangle \tag{11} \\
& \to \sum_{i=1}^{2^k} \alpha_i |\psi\rangle_i |\phi\rangle_i , \tag{12}
\end{align}

which is the desired target state (8). Every unitary operation
acting on $k$ qubits can be performed by $\frac{23}{48} 2^{2k} - \frac{3}{2} 2^k + \frac{4}{3}$
C-NOT gates [11]. We thus need altogether
$2^k - k - 1 + k + \frac{23}{24} 2^{2k} - \frac{3}{2} 2^{k+1} + \frac{8}{3}$
C-NOT gates. This number is bounded from above by its leading
term in $k$. Taking $n = 2k$ we obtain

$$N_{\mathrm{CNOT}} = \frac{23}{24} 2^n . \tag{13}$$

This is the new lowest number of C-NOT gates needed for the
construction of a universal circuit for the preparation of an
arbitrary state.

In the first phase (9) of the procedure given above we used a
method for state preparation that requires more entangling
gates than our method. Naturally, we can use our result
recursively to obtain a slightly lower number of C-NOT operations
needed to prepare the state of the first $k$ qubits. However, this
part of the process does not contribute to the leading order of
the number of C-NOT gates needed for the preparation as
calculated in Eq. (13). The first phase contributes only a term of the
order of $2^{n/2}$, whereas phases three and four contribute terms of
the order of $2^n$.

The depth of the circuit is, in the leading order, given by the
depth of phases three and four, which is $\frac{23}{48} 2^n$; this is less than the
best previous result of $2^n$, but weaker than the theoretical limit
of $\frac{2^n}{n}$.

Odd number of qubits

We express the number of qubits as $n = 2k + 1$. The first
three phases of the procedure, as described by Eqs. (9), (10) and (11),
remain exactly the same as for an even number of
qubits. In phase four (12) we perform a unitary operation
on $k + 1$ instead of $k$ qubits. Summing up the contributions from
all four phases we obtain the overall number of C-NOT gates
required to be
$2^k - k - 1 + k + \frac{23}{48} 2^{2k} - \frac{3}{2} 2^k + \frac{4}{3} + \frac{23}{48} 2^{2k+2} - \frac{3}{2} 2^{k+1} + \frac{4}{3}$.
Similarly to the previous case, the leading order of this sum
bounds the number of C-NOT gates from above. Since
$\frac{23}{48} (1 + 4) 2^{2k} = \frac{115}{48} 2^{n-1}$, it can
be simplified to $N^{\mathrm{odd}}_{\mathrm{CNOT}} = \frac{115}{96} 2^n$. This result is weaker than
the bound (13) for an even number of qubits. However, further
optimizations are possible, since in phase four the required
operation is not a completely defined unitary and one does not
necessarily need the full number of C-NOT gates
required for a general unitary operation on $k + 1$ qubits. Moreover,
even in this case the depth of the circuit, bounded by $\frac{23}{24} 2^n$, is
smaller than the best known result.

CONCLUSIONS

We give an explicit and efficient circuit for the preparation of
arbitrary states of $n$ qubits using a gate library consisting of
a single two-qubit gate (C-NOT) and one-qubit rotations. For an
even number of qubits we have slightly reduced the previously
known upper bound on the number of C-NOT gates needed.
For the special case of four qubits our scheme requires only
9 C-NOT gates (compared to 11 previously known), which
should be within the scope of near-future quantum technology.

Our quantum state preparation scheme also provides a
lower computational depth than the previously known results.
It can be divided into four phases, where the last two can be
performed in parallel, which leads to roughly half the number of
computational steps compared to the previous results. This opens
further optimization possibilities for the experimental
implementation of the state preparation. Our results can help in
designing and building small-scale quantum circuits using present
technologies (see, e.g., Refs. [17, 18]).

Our procedure introduces a conceptually simple way of utilizing
efficient decompositions of arbitrary quantum gates for the
problem of state preparation. In fact, the efficiency of our
procedure is based on the best results for gate decompositions. If
better results are obtained in the future, they will directly
lower our bounds. Moreover, this utilization itself is
very efficient: a circuit for gate decomposition reaching the
lower bound of $4^{n-1}$ [6] C-NOT gates in the leading order would,
when applied to the two unitaries of phases three and four,
lead to state preparation with $2^{n-1}$ C-NOT gates, reaching the
lower bound in the leading order as well.
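
How a gate-decomposition result feeds into the state-preparation count can be sketched as follows. This is an illustrative calculation under the phase counts stated in the text, with hypothetical function names; the pluggable `gate_cost` argument stands for whichever decomposition of a general unitary is used.

```python
from fractions import Fraction

def unitary_cnots(k):
    """C-NOT count quoted in the text for a general k-qubit unitary [11]."""
    return Fraction(23, 48) * 4 ** k - Fraction(3, 2) * 2 ** k + Fraction(4, 3)

def state_prep_cnots(n, gate_cost=unitary_cnots):
    """Total C-NOT count of the four-phase scheme for n >= 2 qubits.

    gate_cost(m) is the C-NOT count of the decomposition used for a
    general m-qubit unitary; replacing it changes the bound directly.
    """
    k = n // 2                   # size of the first (smaller) part
    phase1 = 2 ** k - k - 1      # state preparation on k qubits, Ref. [9]
    phase2 = k                   # one C-NOT per qubit of the first part
    phase3 = gate_cost(k)        # unitary on the first part
    phase4 = gate_cost(n - k)    # unitary on the second part (k or k + 1 qubits)
    return phase1 + phase2 + phase3 + phase4

print(state_prep_cnots(4))  # four qubits
print(state_prep_cnots(5))  # five qubits
```

The two calls reproduce the counts 9 and 26 given for four and five qubits. Passing a hypothetical optimal decomposition, for instance `gate_cost=lambda m: 4 ** (m - 1)`, makes phases three and four cost $2 \cdot 4^{k-1} = 2^{n-1}$ gates for $n = 2k$, which is the leading-order lower bound. For six qubits the plain formula gives 47; the value 46 quoted in the introduction presumably relies on a cheaper phase-one circuit, which this sketch does not model.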

Using our scheme one can also efficiently apply operations
that transform any given state $|\psi\rangle$ of $n$ qubits to any other given
state $|\phi\rangle$. We first run the preparation procedure for $|\psi\rangle$ in
reversed order, which results in the state $|0\rangle^{\otimes n}$. Then we
continue with preparing the target state $|\phi\rangle$. The number of
C-NOT gates needed to perform this composite transformation
is just double the number needed to prepare an arbitrary state
from $|0\rangle^{\otimes n}$. However, the depth of the complete circuit is even
less than double, as the last phase of the reversed process and
the first phase of the preparation process can run on distinct
qubits and therefore be performed in parallel.

ACKNOWLEDGMENTS 

This work was supported by the Action Austria - Slovakia, 
SoMoPro project SIGA 862, the Austrian Science Founda- 
tion FWF within Projects No. P19570-N16, SFB and Co- 
QuS No. W1210-N16, and the European Commission Project 
QESSENCE. The collaboration is a part of OAD/APVV SK- 
AT-0015-10 project. 



[1] M. A. Nielsen and I. L. Chuang, Quantum Computation and
Quantum Information, Cambridge University Press (2000).

[2] C. H. Bennett and G. Brassard, Proceedings of the IEEE
International Conference on Computers, Systems, and Signal
Processing, Bangalore, 175 (1984).

[3] P. Shor, SIAM Rev. 41, 303 (1999).

[4] The C-NOT, or Controlled-NOT, operation flips the state of the
target qubit if and only if the control qubit is in the state $|1\rangle$.
It can be described in the computational basis as $|00\rangle \to |00\rangle$,
$|01\rangle \to |01\rangle$, $|10\rangle \to |11\rangle$ and $|11\rangle \to |10\rangle$, where, e.g., $|00\rangle$
denotes a product state in which both qubits are in the state $|0\rangle$.

[5] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, N. Margolus,
P. Shor, T. Sleator, J. A. Smolin, and H. Weinfurter, Phys.
Rev. A 52, 3457 (1995).

[6] V. V. Shende, I. L. Markov, and S. S. Bullock, Phys. Rev. A 69,
062321 (2004).

[7] J. J. Vartiainen, M. Möttönen, and M. M. Salomaa, Phys. Rev.
Lett. 92, 177902 (2004).

[8] M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa,
Phys. Rev. Lett. 93, 130502 (2004).

[9] M. Möttönen, J. J. Vartiainen, V. Bergholm, and M. M. Salomaa,
Quant. Inf. Comp. 5, 467 (2005); V. Bergholm, J. J. Vartiainen,
M. Möttönen, and M. M. Salomaa, Phys. Rev. A 71,
052330 (2005).

[10] V. V. Shende and I. L. Markov, Quant. Inf. Comp. 5, 49
(2005); M.-Y. Ye, D. Sun, Y.-S. Zhang, and G.-C. Guo, Phys.
Rev. A 70, 022326 (2004).

[11] M. Möttönen and J. J. Vartiainen, Trends in Quantum Computing
Research, NOVA Publishers (2006); V. Shende, S. Bullock,
and I. Markov, IEEE Transactions on Computer-Aided Design
25, 1000 (2006).

[12] Y. Nakajima, Y. Kawano, and H. Sekigawa, Quant. Inf. Comp.
6, 67 (2006).

[13] B. Drury and P. Love, J. Phys. A 41, 395305 (2008).

[14] M. Sedlák and M. Plesch, Cent. Eur. J. Phys. 6, 128 (2008).

[15] M. Saeedi, M. Arabzadeh, M. S. Zamani, and M. Sedighi,
arXiv:1011.2159 [quant-ph] (2010).

[16] As a "computation step" only the performance of C-NOT gates
is considered. Several C-NOT gates performed in parallel are
considered as one step.

[17] M. Riebe, K. Kim, P. Schindler, T. Monz, P. O. Schmidt, T. K.
Körber, W. Hänsel, H. Häffner, C. F. Roos, and R. Blatt, Phys.
Rev. Lett. 97, 220407 (2006).

[18] R. Prevedel, G. Cronenberg, M. S. Tame, M. Paternostro, P.
Walther, M. S. Kim, and A. Zeilinger, Phys. Rev. Lett. 103,
020503 (2009).



